diagnostic performance
A Specialized Large Language Model for Clinical Reasoning and Diagnosis in Rare Diseases
Yang, Tao, Huang, Dandan, Lin, Yunting, Wu, Pengfei, Wu, Zhikun, Ma, Gangyuan, Lu, Yulan, Dong, Xinran, Li, Dingpeng, Ge, Junshuang, Zhang, Zhiyan, Huang, Xuanzhao, Nong, Wenyan, Zhou, Yao, Tang, Hui, Yang, Hongxi, Zhang, Shijie, Li, Juan, Cao, Xiaojun, Yang, Lin, Gao, Xia, Xu, Kaishou, Gu, Xiaoqiong, Zhang, Wen, Xia, Huimin, Liu, Li, Zhou, Wenhao, Li, Mulin Jun
W e assemble a large, domain - specialized clinical corpus and a clinician - validated reasoning set, and develop RareSeek - R1 via staged instruction tuning, chain - of - thought learning, and graph - grounded retrieval. Across multicenter EHR narratives and public benchmarks, RareSeek - R1 attains state - of - the - art accuracy, robust generalization, and stability under noisy or overlapping phenotypes. Augmented retrieval yields the largest gains when narratives pair with prioritized variants by resolving ambiguity and aligning candidates to mechanisms. Human studies show performance on par with experienced physicians and consistent gains in assistive use. Notably, transparent reasoning highlights decisive non - phenotypic evidence (median 23.1%, such as imaging, interventions, functional tests) underpinning many correct diagnoses. This work advances a narrative - first, knowledge - integrated reasoning paradigm that shortens the diagnostic odyssey and enables auditable, clinically translatable decision support.
Approaching Low-Cost Cardiac Intelligence with Semi-Supervised Knowledge Distillation
Zhou, Rushuang, Zhang, Yuan-Ting, Deen, M. Jamal, Dong, Yining
Deploying advanced cardiac artificial intelligence for daily cardiac monitoring is hindered by its reliance on extensive medical data and high computational resources. Low-cost cardiac intelligence (LCCI) offers a promising alternative by using wearable device data, such as 1-lead electrocardiogram (ECG), but it suffers from a significant diagnostic performance gap compared to high-cost cardiac intelligence (HCCI). To bridge this gap, we propose LiteHeart, a semi-supervised knowledge distillation framework. LiteHeart introduces a region-aware distillation module to mimic how cardiologists focus on diagnostically relevant ECG regions and a cross-layer mutual information module to align the decision processes of LCCI and HCCI systems. Using a semi-supervised training strategy, LiteHeart further improves model robustness under limited supervision. Evaluated on five datasets covering over 38 cardiovascular diseases, LiteHeart substantially reduces the performance gap between LCCI and HCCI, outperforming existing methods by 4.27% to 7.10% in macro F1 score. These results demonstrate that LiteHeart significantly enhances the diagnostic capabilities of low-cost cardiac intelligence systems, paving the way for scalable, affordable, and accurate daily cardiac healthcare using wearable technologies.
Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech
Tougui, Ilias, Zakroum, Mehdi, Ghogho, Mounir
Parkinson's Disease (PD) affects over 10 million people worldwide, with speech impairments in up to 89% of patients. Current speech-based detection systems analyze entire utterances, potentially overlooking the diagnostic value of specific phonetic elements. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Using Italian, Spanish, and English datasets, we implemented a bidirectional LSTM with multi-head attention to compare diagnostic performance across the different granularity levels. Phoneme-level analysis achieved superior performance with AUROC of 93.78% +- 2.34% and accuracy of 92.17% +- 2.43%. This demonstrates enhanced diagnostic capability for cross-linguistic PD detection. Importantly, attention analysis revealed that the most informative speech features align with those used in established clinical protocols: sustained vowels (/a/, /e/, /o/, /i/) at phoneme level, diadochokinetic syllables (/ta/, /pa/, /la/, /ka/) at syllable level, and /pataka/ sequences at word level. Source code will be available at https://github.com/jetliqs/clearpd.
CardioRAG: A Retrieval-Augmented Generation Framework for Multimodal Chagas Disease Detection
Shen, Zhengyang, Zhai, Xuehao, Tu, Hua, Shi, Mayue
Chagas disease affects nearly 6 million people worldwide, with Chagas cardiomyopathy representing its most severe complication. In regions where serological testing capacity is limited, AI-enhanced electrocardiogram (ECG) screening provides a critical diagnostic alternative. However, existing machine learning approaches face challenges such as limited accuracy, reliance on large labeled datasets, and more importantly, weak integration with evidence-based clinical diagnostic indicators. W e propose a retrieval-augmented generation framework, CardioRAG, integrating large language models with interpretable ECG-based clinical features, including right bundle branch block, left anterior fascicular block, and heart rate variability metrics. The framework uses vari-ational autoencoder-learned representations for semantic case retrieval, providing contextual cases to guide clinical reasoning. Evaluation demonstrated high recall performance of 89.80%, with a maximum F1 score of 0.68 for effective identification of positive cases requiring prioritized serological testing. CardioRAG provides an interpretable, clinical evidence-based approach particularly valuable for resource-limited settings, demonstrating a pathway for embedding clinical indicators into trustworthy medical AI systems.
Retrieval-Augmented VLMs for Multimodal Melanoma Diagnosis
Accurate and early diagnosis of malignant melanoma is critical for improving patient outcomes. While convolutional neural networks (CNNs) have shown promise in dermoscopic image analysis, they often neglect clinical metadata and require extensive preprocessing. Vision-language models (VLMs) offer a multimodal alternative but struggle to capture clinical specificity when trained on general-domain data. To address this, we propose a retrieval-augmented VLM framework that incorporates semantically similar patient cases into the diagnostic prompt. Our method enables informed predictions without fine-tuning and significantly improves classification accuracy and error correction over conventional baselines. These results demonstrate that retrieval-augmented prompting provides a robust strategy for clinical decision support.
An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis
Wang, Ming, Duan, Zhaoyang, Xue, Dong, Liu, Fangzhou, Zhang, Zhongheng
The labor-intensive nature of medical data annotation presents a significant challenge for respiratory disease diagnosis, resulting in a scarcity of high-quality labeled datasets in resource-constrained settings. Moreover, patient privacy concerns complicate the direct sharing of local medical data across institutions, and existing centralized data-driven approaches, which rely on amounts of available data, often compromise data privacy. This study proposes a federated few-shot learning framework with privacy-preserving mechanisms to address the issues of limited labeled data and privacy protection in diagnosing respiratory diseases. In particular, a meta-stochastic gradient descent algorithm is proposed to mitigate the overfitting problem that arises from insufficient data when employing traditional gradient descent methods for neural network training. Furthermore, to ensure data privacy against gradient leakage, differential privacy noise from a standard Gaussian distribution is integrated into the gradients during the training of private models with local data, thereby preventing the reconstruction of medical images. Given the impracticality of centralizing respiratory disease data dispersed across various medical institutions, a weighted average algorithm is employed to aggregate local diagnostic models from different clients, enhancing the adaptability of a model across diverse scenarios. Experimental results show that the proposed method yields compelling results with the implementation of differential privacy, while effectively diagnosing respiratory diseases using data from different structures, categories, and distributions.
An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning
Zamai, Andrew, Fijalkow, Nathanael, Mansencal, Boris, Simon, Laurent, Navet, Eloi, Coupe, Pierrick
The differential diagnosis of neurodegenerative dementias is a challenging clinical task, mainly because of the overlap in symptom presentation and the similarity of patterns observed in structural neuroimaging. To improve diagnostic efficiency and accuracy, deep learning-based methods such as Convolutional Neural Networks and Vision Transformers have been proposed for the automatic classification of brain MRIs. However, despite their strong predictive performance, these models find limited clinical utility due to their opaque decision making. In this work, we propose a framework that integrates two core components to enhance diagnostic transparency. First, we introduce a modular pipeline for converting 3D T1-weighted brain MRIs into textual radiology reports. Second, we explore the potential of modern Large Language Models (LLMs) to assist clinicians in the differential diagnosis between Frontotemporal dementia subtypes, Alzheimer's disease, and normal aging based on the generated reports. To bridge the gap between predictive accuracy and explainability, we employ reinforcement learning to incentivize diagnostic reasoning in LLMs. Without requiring supervised reasoning traces or distillation from larger models, our approach enables the emergence of structured diagnostic rationales grounded in neuroimaging findings. Unlike post-hoc explainability methods that retrospectively justify model decisions, our framework generates diagnostic rationales as part of the inference process-producing causally grounded explanations that inform and guide the model's decision-making process. In doing so, our framework matches the diagnostic performance of existing deep learning methods while offering rationales that support its diagnostic conclusions.
Tri-MTL: A Triple Multitask Learning Approach for Respiratory Disease Diagnosis
Kim, June-Woo, Lee, Sanghoon, Toikkanen, Miika, Hwang, Daehwan, Kim, Kyunghoon
Auscultation remains a cornerstone of clinical practice, essential for both initial evaluation and continuous monitoring. Clinicians listen to the lung sounds and make a diagnosis by combining the patient's medical history and test results. Given this strong association, multitask learning (MTL) can offer a compelling framework to simultaneously model these relationships, integrating respiratory sound patterns with disease manifestations. While MTL has shown considerable promise in medical applications, a significant research gap remains in understanding the complex interplay between respiratory sounds, disease manifestations, and patient metadata attributes. This study investigates how integrating MTL with cutting-edge deep learning architectures can enhance both respiratory sound classification and disease diagnosis. Specifically, we extend recent findings regarding the beneficial impact of metadata on respiratory sound classification by evaluating its effectiveness within an MTL framework. Our comprehensive experiments reveal significant improvements in both lung sound classification and diagnostic performance when the stethoscope information is incorporated into the MTL architecture.
PointExplainer: Towards Transparent Parkinson's Disease Diagnosis
Wang, Xuechao, Nomm, Sven, Huang, Junqing, Medijainen, Kadri, Toomela, Aaro, Ruzhansky, Michael
A B S T R A C T Deep neural networks have shown potential in analyzing digitized hand-drawn signals for early diagnosis of Parkinson's disease. However, the lack of clear inter-pretability in existing diagnostic methods presents a challenge to clinical trust. In this paper, we propose PointExplainer, an explainable diagnostic strategy to identify hand-drawn regions that drive model diagnosis. Specifically, PointExplainer assigns discrete attribution values to hand-drawn segments, explicitly quantifying their relative contributions to the model's decision. Its key components include: (i) a diagnosis module, which encodes hand-drawn signals into 3D point clouds to represent hand-drawn trajectories, and (ii) an explanation module, which trains an interpretable surrogate model to approximate the local behavior of the black-box diagnostic model. We also introduce consistency measures to further address the issue of faithfulness in explanations. Extensive experiments on two benchmark datasets and a newly constructed dataset show that PointExplainer can provide intuitive explanations with no diagnostic performance degradation. Introduction Parkinson's disease (PD) is one of the most prevalent neurological disorders worldwide, leading to a decrease in functional, cognitive, and behavioral abilities [1, 10]. Despite the unclear etiology and lack of a cure, evidence indicates that early diagnosis, coupled with subsequent neuroprotective interventions, can significantly delay its progression [53]. Hand drawing is a common but complex human activity, requiring fine motor control and involving a sophisticated interplay of cognitive, sensory, and perceptual-motor functions [14]. Dysgraphia is recognized as a crucial biomarker in the early stages of PD [39]. Digitized hand-drawn analysis [6, 26], as a noninvasive and easily accessible biometric technology, has emerged as a promising computer-aided approach for diagnosing PD [23, 30, 22, 72, 47, 21].
Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning
Sun, Zhoujian, Liu, Ziyi, Luo, Cheng, Chu, Jiebin, Huang, Zhengxing
Recent advances in large language models (LLMs) have shown promising results in medical diagnosis, with some studies indicating superior performance compared to human physicians in specific scenarios. However, the diagnostic capabilities of LLMs are often overestimated, as their performance significantly deteriorates in interactive diagnostic settings that require active information gathering. This study investigates the underlying mechanisms behind the performance degradation phenomenon and proposes a solution. We identified that the primary deficiency of LLMs lies in the initial diagnosis phase, particularly in information-gathering efficiency and initial diagnosis formation, rather than in the subsequent differential diagnosis phase. To address this limitation, we developed a plug-and-play method enhanced (PPME) LLM agent, leveraging over 3.5 million electronic medical records from Chinese and American healthcare facilities. Our approach integrates specialized models for initial disease diagnosis and inquiry into the history of the present illness, trained through supervised and reinforcement learning techniques. The experimental results indicate that the PPME LLM achieved over 30% improvement compared to baselines. The final diagnostic accuracy of the PPME LLM in interactive diagnostic scenarios approached levels comparable to those achieved using complete clinical data. These findings suggest a promising potential for developing autonomous diagnostic systems, although further validation studies are needed.